Supervised relation extraction uses a pre-defined schema of relation types (such as born-in or
employed-by). This approach requires labeling textual relations, a time-consuming and difficult
process. This has led to significant interest in distantly-supervised learning. Here one aligns
existing database records with the sentences in which these records have been "rendered", and from
this labeling one can train a machine learning system as before [1][2]. However, this method relies
on the availability of a large database that has the desired schema.

The need for pre-existing databases can be avoided by not having any fixed schema. This is the
approach taken by OpenIE [3]. Here surface patterns between mentions of concepts serve as
relations. This approach requires no supervision and has tremendous flexibility, but lacks the
ability to generalize. For example, OpenIE may find FERGUSON-historian-at-HARVARD but does not know
FERGUSON-is-a-professor-at-HARVARD.

One way to gain generalization is to cluster textual surface forms that have similar meaning
[4][5][6][2]. While the clusters discovered by all these methods usually contain semantically related items,
closer inspection invariably shows that they do not provide reliable implicature. For example, a 
cluster may include historian-at, professor-at, scientist-at, worked-at. However, scientist-at does 
not necessarily imply professor-at, and worked-at certainly does not imply scientist-at. In fact, we 
contend that any relational schema would inherently be brittle and ill-defined — having ambiguities, 
problematic boundary cases, and incompleteness. 

In response to this problem, we present a new approach: implicature with universal schemas. Here 
we embrace the diversity and ambiguity of original inputs. This is accomplished by defining our 
schema to be the union of all source schemas: original input forms, e.g. variants of surface patterns
similarly to OpenIE, as well as relations in the schemas of pre-existing structured databases. But
unlike OpenIE, we learn asymmetric implicature among relations and entity types. This allows us
to probabilistically "fill in" inferred unobserved entity-entity relations in this union. For example,
after observing FERGUSON-historian-at-HARVARD, our system infers FERGUSON-professor-at-HARVARD,
but not vice versa.

At the heart of our approach is the hypothesis that we should concentrate on predicting source 
data — a relatively well defined task that can be evaluated and optimized — as opposed to modeling 
semantic equivalence, which we believe will always be elusive.

To reason with a universal schema, we learn latent feature representations of relations, tuples and
entities. These act, through dot products, as natural parameters of a log-linear model for the probability
that a given relation holds for a given tuple. We show experimentally that this approach significantly 
outperforms a comparable baseline without latent features, and the current state-of-the-art distant 
supervision method. 
2 Model



We use $\mathcal{R}$ to denote the set of relations we seek to predict (such as works-written in
Freebase, or the X-heads-Y pattern), and $\mathcal{T}$ to denote the set of input tuples. For
simplicity we assume each relation to be binary. Given a relation $r \in \mathcal{R}$ and a tuple
$t \in \mathcal{T}$, the pair $(r, t)$ is a fact, or relation instance. The input to our model is a
set of observed facts $\mathcal{O}$, and the observed facts for a given tuple are

$\mathcal{O}_t := \{(r, t) \in \mathcal{O}\}$.

Our goal is a model that can estimate, for a given relation $r$ (such as X-historian-at-Y) and a given
tuple $t$ (such as <Ferguson,Harvard>), a score $c_{r,t}$ for the fact $(r, t)$. This matrix completion
problem is related to collaborative filtering. We can think of each tuple as a customer, and each
relation as a product. Our goal is to predict how the tuple rates the relation (rating 0 = false,
rating 1 = true), based on observed ratings in $\mathcal{O}$. We interpret $c_{r,t}$ as the probability
$p(y_{r,t} = 1)$ where $y_{r,t}$ is a binary random variable that is true iff $(r, t)$ holds. To this
end we introduce a series of exponential family models inspired by generalized PCA [8], a probabilistic
generalization of Principal Component Analysis. These models will estimate the confidence in $(r, t)$
using a natural parameter $\theta_{r,t}$ and the logistic function:

$c_{r,t} := p(y_{r,t} = 1 \mid \theta_{r,t}) := \frac{1}{1 + \exp(-\theta_{r,t})}$.
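The mapping from natural parameter to confidence can be sketched in a few lines of Python. This is only an illustration of the logistic function above; the function name `confidence` is ours, and the branch on the sign of the input is a numerical-stability choice rather than part of the model.

```python
import math


def confidence(theta: float) -> float:
    """Return c_{r,t} = 1 / (1 + exp(-theta)) for a natural parameter theta."""
    # Branch on the sign so that exp() is only ever called on a
    # non-positive argument and cannot overflow.
    if theta >= 0:
        return 1.0 / (1.0 + math.exp(-theta))
    z = math.exp(theta)
    return z / (1.0 + z)
```

A natural parameter of zero corresponds to a confidence of one half, and confidences for `theta` and `-theta` sum to one.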

We follow [9] and use a ranking-based objective function to estimate the parameters of our models.
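The exact objective of [9] is not reproduced in this section. As a rough illustration of what a ranking-based objective can look like, the sketch below uses a pairwise logistic loss: an observed fact should receive a higher natural parameter than an unobserved one. The pairwise logistic form and the function name are assumptions made for this example.

```python
import math


def pairwise_ranking_loss(theta_observed: float, theta_unobserved: float) -> float:
    """Return -log(sigmoid(theta_observed - theta_unobserved)).

    The loss is small when the observed fact is ranked above the
    unobserved one and grows roughly linearly when the order is wrong.
    """
    diff = theta_observed - theta_unobserved
    # -log(sigmoid(diff)) = log(1 + exp(-diff)), evaluated without overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))
```

Minimizing such a loss over sampled pairs of observed and unobserved facts only constrains the relative order of scores, which matches the ranked evaluation used later.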

Latent Feature Model One way to define $\theta_{r,t}$ is through a latent feature model F. We measure
compatibility between relation $r$ and tuple $t$ as a dot product of two latent feature representations
of size $K^F$: $a_r$ for relation $r$, and $v_t$ for tuple $t$. This gives
$\theta^F_{r,t} := \sum_{k=1}^{K^F} a_{r,k} v_{t,k}$ and corresponds to the original generalized PCA
that learns a low-rank factorization of $\Theta = (\theta_{r,t})$.
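A minimal sketch of the latent feature score, assuming plain Python lists for the vectors $a_r$ and $v_t$. The toy vectors and their values are invented for illustration and are not learned parameters.

```python
def theta_f(a_r, v_t):
    """Dot product of a relation vector a_r and a tuple vector v_t."""
    if len(a_r) != len(v_t):
        raise ValueError("a_r and v_t must both have K^F components")
    return sum(a * v for a, v in zip(a_r, v_t))


# Hypothetical K^F = 3 representations.
a_historian_at = [0.5, -1.0, 2.0]
v_ferguson_harvard = [1.0, 0.5, 0.25]
score = theta_f(a_historian_at, v_ferguson_harvard)
```

Stacking all $a_r$ as rows of one matrix and all $v_t$ as rows of another yields the low-rank product whose entries are the scores $\theta^F_{r,t}$.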

Neighborhood Model We can interpolate the confidence for a given tuple and relation based on
the truth of other similar relations for the same tuple. In collaborative filtering this is referred to
as a neighborhood-based approach [10]. We implement a neighborhood model N via a set of weights
$w_{r,r'}$, where each corresponds to a directed association strength between relations $r$ and $r'$.
Summing these up gives $\theta^N_{r,t} := \sum_{r' \in \mathcal{O}_t \setminus \{r\}} w_{r,r'}$, where
the sum ranges over the other relations $r'$ observed for tuple $t$.¹
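The directed nature of the weights can be illustrated with a small sketch. It assumes the weights are stored in a dictionary keyed by ordered relation pairs and that missing pairs have weight zero; both choices, and all weight values, are assumptions for this example.

```python
def theta_n(r, observed_relations, weights):
    """Sum w_{r,r'} over relations r' observed for the tuple, excluding r itself."""
    return sum(
        (weights.get((r, other), 0.0) for other in observed_relations if other != r),
        0.0,
    )


# Hypothetical directed weights: historian-at supports professor-at strongly,
# while professor-at supports historian-at only weakly.
weights = {
    ("professor-at", "historian-at"): 1.5,
    ("historian-at", "professor-at"): 0.25,
}
observed_for_tuple = {"historian-at"}
score_professor = theta_n("professor-at", observed_for_tuple, weights)
score_historian = theta_n("historian-at", observed_for_tuple, weights)
```

Because the relation being scored is excluded from the sum, an observed fact cannot trivially predict itself.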

Entity Model Relations have selectional preferences: they allow only certain types in their
argument slots. To capture this observation, we learn a latent entity representation from data. For
each entity $e$ we introduce a latent feature vector $t_e \in \mathbb{R}^l$. In addition, for each
relation $r$ and argument slot $i$ we introduce a feature vector $d_i$. Measuring compatibility of an
entity tuple and relation amounts to summing up the compatibilities between each argument slot
representation and the corresponding entity representation:
$\theta^E_{r,t} := \sum_{i=1}^{2} \sum_{k=1}^{l} d_{i,k}\, t_{t_i,k}$, where $t = (t_1, t_2)$ and the
vectors $d_i$ belong to relation $r$.
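The entity score can be sketched as a sum of per-slot dot products. The two-dimensional vectors below are invented so that the first dimension loosely stands for "person" and the second for "organization"; in the model these representations are learned rather than hand-set.

```python
def theta_e(slot_vectors, entity_vectors):
    """Sum over argument slots of the dot product between slot and entity vectors."""
    if len(slot_vectors) != len(entity_vectors):
        raise ValueError("need one entity vector per argument slot")
    return sum(
        sum(d * e for d, e in zip(d_i, t_i))
        for d_i, t_i in zip(slot_vectors, entity_vectors)
    )


# Hypothetical slot vectors for professor-at: slot 1 prefers persons,
# slot 2 prefers organizations.
d_professor_at = [[1.0, 0.0], [0.0, 1.0]]
t_ferguson = [1.0, 0.0]
t_harvard = [0.0, 1.0]
well_typed = theta_e(d_professor_at, [t_ferguson, t_harvard])
swapped = theta_e(d_professor_at, [t_harvard, t_ferguson])
```

Swapping the arguments lowers the score, which is the selectional-preference effect the entity model is meant to capture.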

Combined Models In practice all the above models can capture important aspects of the data. 
Hence we also use various combinations, such as $\theta^{N,F,E}_{r,t} := \theta^N_{r,t} + \theta^F_{r,t} + \theta^E_{r,t}$.
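As a worked example with invented component scores (not values from our experiments), suppose the neighborhood, latent feature and entity models give 1.5, 0.5 and -1.0 for some fact. The combined natural parameter and the resulting confidence under the logistic function are then:

```latex
\begin{align*}
\theta^{N,F,E}_{r,t} &= \theta^N_{r,t} + \theta^F_{r,t} + \theta^E_{r,t}
                      = 1.5 + 0.5 + (-1.0) = 1.0, \\
c_{r,t} &= \frac{1}{1 + \exp(-1.0)} \approx 0.731 .
\end{align*}
```

A negative contribution from one component, here the entity model, lowers the confidence without vetoing the fact outright.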



3 Experiments 

Does reasoning jointly across a universal schema help to improve over more isolated approaches? 
In the following we seek to answer this question empirically. 

Data Our experimental setup is roughly equivalent to previous work [2], and hence we omit
details. To summarize, we consider each pair $(t_1, t_2)$ of Freebase entities that appear together in
a corpus. Its set of observed facts $\mathcal{O}_t$ corresponds to the extracted surface patterns (in
our case lexicalized dependency paths) between mentions of $t_1$ and $t_2$, and the relations of $t_1$
and $t_2$ in Freebase. We divide all our tuples into approximately 200k training tuples and 200k test
tuples. The total number of relations (patterns and from Freebase) is approximately 4k.
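One possible in-memory layout for such data is sketched below: facts are grouped by entity pair, so that surface patterns and structured relations for the same pair end up in one set. The relation names, entity names and the `pattern:` / `structured:` prefixes are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def group_facts_by_tuple(facts):
    """Map each entity pair to the set of relations observed for it."""
    by_tuple = defaultdict(set)
    for relation, entity_pair in facts:
        by_tuple[entity_pair].add(relation)
    return dict(by_tuple)


# Toy facts mixing surface patterns and structured relations.
facts = [
    ("pattern:X-historian-at-Y", ("Ferguson", "Harvard")),
    ("structured:employed-by", ("Ferguson", "Harvard")),
    ("pattern:X-heads-Y", ("Smith", "Acme")),
]
observed = group_facts_by_tuple(facts)

# One column index per relation, regardless of which source it came from.
relation_index = {r: i for i, r in enumerate(sorted({r for r, _ in facts}))}
```

Treating both kinds of relation as columns of the same matrix is what makes the schema "universal" in the sense used here.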



¹Notice that the neighborhood model amounts to a collection of local log-linear classifiers, one for each
relation $r$ with weights $w_r$.

Predicting Freebase and Surface Pattern Relations For evaluation we use two collections of
relations: Freebase relations and surface patterns. In either case we compare the competing systems 
with respect to their ranked results for each relation in the collection. 

Our first baseline is MI09, a distantly supervised classifier based on the work of [1]. We also
compare against YA11, a version of MI09 that uses preprocessed pattern cluster features according
to [2]. The third baseline is SU12, the state-of-the-art Multi-Instance Multi-Label system by [11].
The remaining systems are our neighborhood model (N), the factorized model (F), their combination 
(NF) and the combined model with a latent entity representation (NFE). 

The results in terms of mean average precision (with respect to pooled results from each system) are 
in the table below: 



Relation          #     MI09   YA11   SU12   N      F      NF     NFE
Total Freebase    334   0.48   0.52   0.57   0.52   0.66   0.67   0.69
Total Pattern     329   -      -      -      0.28   0.56   0.50   0.46
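For reference, mean average precision over a collection of relations can be computed as sketched below. The sketch assumes that each relation comes with a ranked list of candidate tuples and a set of tuples judged correct (for instance from the pooled results), and that average precision is normalized by the number of correct tuples; the normalization and the function names are assumptions, not a description of our evaluation scripts.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k at which a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(rankings, relevant_sets):
    """Mean of average_precision over all relations in `rankings`."""
    if not rankings:
        return 0.0
    scores = [
        average_precision(ranked, relevant_sets.get(relation, set()))
        for relation, ranked in rankings.items()
    ]
    return sum(scores) / len(scores)
```

Each relation contributes equally to the mean, so rare relations count as much as frequent ones.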



For Freebase relations, we can see that adding pattern cluster features (and hence incorporating more
data) helps YA11 to improve over MI09. Likewise, we see that the factorized model F improves
over N, again learning from unlabeled data. This improvement is bigger than the corresponding
change between MI09 and YA11, possibly indicating that our latent representations are optimized
directly towards improving prediction performance. Our best model, the combination of N, F and E, 
outperforms all other models in terms of total MAP, indicating the power of selectional preferences 
learned from data. 

MI09, YA11 and SU12 are designed to predict structured relations, and so we omit them for results
on surface patterns. Looking at our own models for predicting tuples of surface patterns, we again
see that learning a latent representation (F, NF and NFE models) from additional data helps
substantially over the non-latent N model.

All our models are fast to train. The slowest model trains in just 30 minutes. By contrast, training 
the topic model in YA11 alone takes 4 hours. Training SU12 takes two hours (on less data). Also
notice that our models not only learn to predict Freebase relations, but also approximately 4k surface 
pattern relations. 

4 Conclusion 

We represent relations using universal schemas. Such schemas contain surface patterns as relations, 
as well as relations from structured sources. We can predict missing tuples for surface pattern
relations and structured schema relations. We show this experimentally by contrasting a series of popular
weakly supervised models to our collaborative filtering models that learn latent feature
representations across surface patterns and structured relations. Moreover, our models are computationally
efficient, requiring less time than comparable methods, while learning more relations. 

Reasoning with universal schemas is not merely a tool for information extraction. It can also serve 
as a framework for various data integration tasks, for example, schema matching. In future work we 
also plan to integrate universal entity types and attributes into the model. 